home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
ftp.cs.arizona.edu
/
ftp.cs.arizona.edu.tar
/
ftp.cs.arizona.edu
/
icon
/
newsgrp
/
group93a.txt
/
000003_icon-group-sender _Sun Jan 3 21:17:57 1993.msg
< prev
next >
Wrap
Internet Message Format
|
1993-04-21
|
34KB
Received: by cheltenham.cs.arizona.edu; Mon, 4 Jan 1993 04:34:12 MST
Date: 3 Jan 93 21:17:57 GMT
From: agate!spool.mu.edu!olivea!pagesat!spssig.spss.com!uchinews!ellis!goer@ucbvax.Berkeley.EDU (Richard L. Goerwitz)
Organization: University of Chicago
Subject: parser generator, part 1
Message-Id: <1993Jan3.211757.28395@midway.uchicago.edu>
Sender: icon-group-request@cs.arizona.edu
To: icon-group@cs.arizona.edu
Status: R
Errors-To: icon-group-errors@cs.arizona.edu
Warning -
Don't bother unpacking this file unless a) you know what a parser
generator is, and b) you are interested in thinking about how one
should look for Icon.
-Richard Goerwitz
---- Cut Here and feed the following to sh ----
#!/bin/sh
# This is a shell archive (produced by shar 3.49)
# To extract the files from this archive, save it to a file, remove
# everything above the "!/bin/sh" line above, and type "sh file_name".
#
# made 01/03/1993 20:21 UTC by richard@zenu
# Source directory /u/richard/Ibpag
#
# existing files will NOT be overwritten unless -c is specified
# This format requires very little intelligence at unshar time.
# "if test", "cat", "rm", "echo", "true", and "sed" may be needed.
#
# This is part 1 of a multipart archive
# do not concatenate these parts, unpack them in order with /bin/sh
#
# This shar contains:
# length mode name
# ------ ---------- ------------------------------------------
# 41902 -rw-r--r-- ibpag.icn
# 18563 -r--r--r-- maketbls.icn
# 9627 -r--r--r-- preproc.icn
# 25415 -r--r--r-- itokens.icn
# 3663 -r--r--r-- debugme.icn
# 1190 -r--r--r-- errors.icn
# 2062 -r--r--r-- slashupto.icn
# 4314 -r--r--r-- rewrap.icn
# 762 -r--r--r-- strip.icn
# 1752 -rw-r--r-- Makefile.dist
#
if test -r _shar_seq_.tmp; then
echo 'Must unpack archives in sequence!'
echo Please unpack part `cat _shar_seq_.tmp` next
exit 1
fi
# ============= ibpag.icn ==============
if test -f 'ibpag.icn' -a X"$1" != X"-c"; then
echo 'x - skipping ibpag.icn (File already exists)'
rm -f _shar_wnt_.tmp
else
> _shar_wnt_.tmp
echo 'x - extracting ibpag.icn (Text)'
sed 's/^X//' << 'SHAR_EOF' > 'ibpag.icn' &&
X############################################################################
X#
X# Name: ibpag.src
X#
X# Title: Icon-based parser generator
X#
X# Author: Richard L. Goerwitz
X#
X# Version: 1.21
X#
X############################################################################
X#
X#
X# General Description:
X#
X# IBPAG is a simple tool for generating parsers from grammar
X# specifications. Sounds pretty mundane, but those who have never
X# used a parser generator will perhaps be surprised to find that this
X# kind of tool forms the basis of most modern programming language
X# implementations. Parser generators are also used in preprocessors,
X# transducers, compilers, interpreters, calculators and in fact for
X# just about any situation where some form of structured input needs
X# to be converted into some form of structured output.
X#
X# IBPAG is not a full, working system. In particular, startup
X# times for compiled IBPAG programs is very slow, since I can't find
X# a good way of writing/reconstructing the necessary structures. For
X# applications where a startup delay is not a problem, IBPAG will
X# work fine. What IBPAG is supposed to be is a very preliminary
X# crack at designing a parser generator for Icon. It really ought to
X# be redone in C some day, and its error handling facilities need to
X# be tested better. It also ought to be redone as an LALR (and not
X# LR) parser, and some method of storing parse tables ought to be
X# found that permits fast/easy reconstruction at run-time. This
X# version has to be regarded as in beta testing. I doubt I'll ever
X# take it, however, much further - at least in its present form.
X#
X#
X# More technical description:
X#
X# Technically speaking, IBPAG is a preprocessor that accepts a
X# Icon-like source file from the standard input containing grammar
X# productions and actions, converts them into parsing tables and
X# associated code, and then adds to this an LR parser, tokenizer,
X# error handler, and a few debugging tools. Writes the combination
X# to the standard output, along with the necessary action and goto
X# tables. The tables are generated using a very general (but also
X# very sluggish) LR(1) algorithm. IBPAG is not terribly practical
X# for most micros. It was written as an excercize by me (a linguist
X# with far too little CS training) while auditing a compiler course
X# as a some-time diversion from many dissertation rewrites...
X#
X# Cycles and epsilon moves are handled correctly (to my
X# knowledge). Shift-reduce conflicts are handled in the normal way
X# (i.e. pick the rule with the highest priority, and, in cases where
X# the priority is the same, check the associativities; flag
X# reduce/reduce conflicts as errors, and for shift/reduce conflicts
X# pick shift by default [shift/shift conflicts don't matter, since we
X# shift in either case, and no reduction takes place]).
X#
X#
X# Running IBPAG:
X#
X# Invoking IBPAG is very, very simple. There are just two
X# (optional) command-line switches:
X#
X# ibpag [-d] [-v] < inputfile > outputfile
X#
X# Where the -d switch causes profuse debugging messages to be
X# displayed, and the -v switch ("verbose") causes somewhat more
X# restrained messages to be emitted. -d implies -v. Inputfile is an
X# IBPAG source file (see below on its format). Outputfile is an LR
X# parser with appropriate tables, generated from the specs contained
X# in inputfile.
X#
X# For more information how to set up IBPAG files and incorporate
X# them into larger Icon programs, see the section just below, "How to
X# use IBPAG."
X#
X############################################################################
X#
X#
X# How to use IBPAG
X# ________________
X#
X#
X# I. Prerequisites
X#
X# The only prerequisites to using IBPAG are a knowledge of how to
X# write a basic LR-parsable context-free grammar and a familiarity
X# with the Icon programming language. IBPAG source files use
X# essentially the same syntax as Icon source files. In a couple of
X# cases, though the semantics of are quite different. IBPAG is
X# essentially just a preprocessor. IBPAG-specific constructs,
X# therefore, should be seen, not as Icon expressions, but as
X# directives to the preprocessor, specifying how to construct the
X# actual output code.
X#
X# In the sections that follow, I will discuss how to set up an
X# IBPAG file, i.e. how to declare a start symbol and write rules.
X# The issue of the tokenizer will then be addressed, along with a
X# treatment of epsilon, and two reserved tokens: $ and error.
X# Finally the overall compatibility of IBPAG and Icon code will be
X# discussed, as well as how to start the IBPAG parser from the Icon
X# code. If you still have problems and/or questions, try reading the
X# UNIX manual page on YACC. IBPAG and YACC have some superficial
X# similarities that may be helpful in sorting out why IBPAG works the
X# way it does. Alternatively write to me, Richard Goerwitz, via
X# e-mail at goer@midway.uchicago.edu. I'll be glad to field
X# questions and provide anyone with the latest version I have
X# on-hand. Doubtless there will be many bugs to find, and I'll be
X# happy to fix any that get reported. At some point, the entire
X# table generating process (in maketbls.icn) will need to be ripped
X# out, and replaced with one of the new, faster LR(1) (or perhaps
X# LALR) algorithms.
X#
X#
X# II. The start symbol
X#
X# All context-free grammars have a start-symbol, i.e. a terminal
X# that can be considered the "goal" of a parse. Because cycles (and
X# bugs) in the grammar can make it hard for IBPAG to determine what
X# the start symbol is, it must be declared. This is done with a
X# start-symbol declaration, which has the following form:
X#
X# <start-symbol-declaration> ::= start_symbol identifier
X#
X# If no start-symbol declaration appears in an IBPAG source file,
X# IBPAG assumes S as the start symbol.
X#
X#
X# III. Rule declarations
X#
X# The heart of the IBPAG source file is the rule declaration.
X# Basically, rule declarations tell IBPAG the structure of the grammar
X# it is to assume when generating a parser. They also tell it what
X# rules to associate with what Icon code. Rule declarations look a
X# lot like procedure declarations, except that they begin with "rule"
X# instead of "procedure" and must specify a priority and
X# associativity. For those with formalistic leanings, the format of
X# the rule declaration is as follows (opt = optional):
X#
X# <rule-declaration> ::= <rule-header> ... end
X# <rule-header> ::= rule <priority> <associativity> \
X# <left-hand-symbol> ( <right-hand-symbol-list> )
X# <priority> ::= integer-literal (opt)
X# <associativity> ::= left | right | none (opt)
X# <left-hand-side> ::= identifier
X# <right-hand-side> ::= <symbol-spec> | <symbol-spec>,<right-hand-side>
X# <symbol-spec> ::= <symbol> | <symbol> \| <symbol-spec>
X# <symbol> ::= identifier | string-literal
X#
X# The "..." indicates material that is precisely the same as for a
X# standard Icon procedure declaration. The only difference between a
X# rule and a procedure (syntactically, that is) is in the header.
X# Superficially, rule and procedure headers look a lot alike. The
X# rule headers, though, have extra elements (optional priority and
X# associativity). They also interpret arguments in the option list
X# quite differently than Icon does. In particular, these arguments
X# are either strings, identifiers, or any combination of these
X# separated by a bar. Strings correspond to terminals, identifiers
X# to nonterminals. The bar will be discussed below.
X#
X# For those with a more practical bent, here is a sample rule
X# definition:
X#
X# rule 1 none S(NP, VP)
X# return ["S", arg1, arg2]
X# end
X#
X# The 1 denotes the priority (here very low), while "none" specifies
X# the associativity (can be right, left, or none). S is the
X# left-hand side of the rule, and NP and VP are the (nonterminal)
X# symbols that make up its right-hand side. Normally, the priority
X# is not needed (the default is 1). In fact, it is best to try to
X# write grammars without any priorities or associativity rules. The
X# default associativity is none.
X#
X# Please note the body of the rule itself. The code there is
X# triggered whenever an S is found (i.e. when a reduction via the
X# above S-rule occurs during the parse). For those who know YACC,
X# arg1 is equivalent to $1 and the return expression is equivalent to
X# $$ =. Essentially, the above rule definition says that an S is
X# composed of an NP and a VP, and that if an S is encountered, the
X# parser is to create a list having three members, the first being
X# the string "S", the second being the result that was returned when
X# the NP was found, and the third being the result returned when the
X# VP was encountered. Note that the above rule implies the existence
X# of at least two other rules having NP and VP, respectively, as
X# their left-hand sides. For example:
X#
X# rule NP("det", "noun")
X# return ["NP", arg1, arg2]
X# end
X#
X# rule VP("v", NP)
X# return ["VP", arg1, arg2]
X# end
X#
X# If two or more right-hand sides differ in just a single symbol,
X# it is acceptable to combine them into one declaration using a
X# vertical slash or bar. Although this bar looks like the usual Icon
X# alternation operator, in this context IBPAG uses it as a macro
X# device. In the following case, two rules are generated, each with
X# one of the alternatives, "that" or "which," in its RHS. The same
X# code is triggered by a reduction via either rule:
X#
X# rule RELCL("that"|"which", S)
X# return ["RELCL", arg1, arg2]
X# end
X#
X#
X# IV. The tokenizer
X#
X# Note that in the above examples, "v," "det," "noun," "that", and
X# "which" are considered terminal symbols. The user must, in fact,
X# create a lexical analyzer called iparse_tokens that returns these
X# symbols. The precise form in which iparse_tokens must function is
X# as follows:
X#
X# iparse_tokens: file -> TOK records
X# (stream) -> Ts
X# Where stream is an open file, and Ts are TOK records.
X# As a side-effect, iparse_tokens should increment the global
X# variable line_number (once right away, then once for each
X# newline encountered, or anything equivalent thereto)
X#
X# The -> notation above implies that iparse_tokens is a pure
X# function. I just can't think of a good notation for generators.
X# Iparse_tokens in fact *must be* a generator. TOK records contain
X# the terminal symbol names in their first field. If the literal
X# string value of the symbols is to be used by any of the rule code,
X# it should be stored in the second field of the TOK record. A
X# typical value returned might be:
X#
X# TOK("v", "eat")
X#
X# YACC/LEX's heterogenous yylval/return (TOKEN) strategy is really
X# quite inelegant, and although my system isn't perfect, it avoids
X# the global variable, and hence makes running simultaneous parsers a
X# bit simpler undertaking (the only globals to worry about in IBPAG
X# are line_number, errors, and fail_on_error, all of which can, if
X# desired, be ignored).
X#
X# If you want line numbers to be output with any error messages
X# that the parser might report, then make sure that the tokenizer
X# keeps track of how many lines it has read by incrementing
X# line_number. The parser itself automatically initializes this
X# (global) variable to 0, so all you'll normally have to do is
X# increment it by one for each newline the tokenizer encounters. If
X# you leave line_number alone, the parser simply does not print out
X# line numbers for errors it reports (which is somewhat less helpful,
X# but workable nonetheless).
X#
X#
X# V. Epsilon
X#
X# If needed, epsilon (i.e. the empty string) can be specified by
X# an empty Icon string (e.g. rule REL(""|"that")). "" is a perfectly
X# legal token. It is absolutely vital, however, to note that
X# *epsilon is never pushed onto the value stack*. Hence in the above
X# instance, if arg1 were to be accessed, an error would result if a
X# reduction via REL -> epsilon (rather than by REL -> "that")
X# occurred! For this reason it is wise to keep epsilon isolated
X# (e.g. rule 10 REL("") and rule 11 REL("that"); note that giving
X# both rules an associativity probably isn't necessary here, although
X# giving the second rule the higher priority often *will* be). Note
X# also that the tokenizer must never return an empty string as a
X# token (i.e. TOK("")). Not only will the parser refuse to place it
X# on the value stack, but it will not even recognize it as a valid
X# lookahead symbol (since, by definition, it doesn't contain
X# anything).
X#
X#
X# VI. Reserved tokens ($ and error)
X#
X# Besides epsilon, the only string (qua symbol) the tokenizer
X# cannot return at will is a TOK with either "error" or "$" as its
X# first field (e.g. TOK("$")). The $ symbol denotes end-of-input,
X# and *must* be returned by the tokenizer when the input stream ends
X# (or when, for some other reason, the token stream has stopped). I
X# could have cut the code so that failure or &null signalled the same
X# thing, but I prefer to use failure of the tokenizer to signal an
X# unexpected end of input, and I don't like the inconsistency of
X# having &null be a valid token. The bottom line is that the
X# tokenizer must return TOK("$") on end-of-input. You'll get used to
X# it :-).
X#
X# The "error" terminal is reserved for error productions. If a
X# syntax error occurs, and the token "error" exists in the grammar,
X# the parser will attempt to back up as one state to see if any of
X# them permits "error" as a valid lookahead symbol. If so, the
X# parser jumps to the state that would have resulted if the parser
X# had encountered "error" at that point. What this does is allow us
X# to write error-recovery rules:
X#
X# global error_count
X# rule EXPR("(", EXPR, ")")
X# return arg1
X# end
X# rule REGEXP("(", REGEXP, "error")
X# write(&errout, "unmatched \"(\"; resynchronizing")
X# # Pretend we got a right parenthesis, and continue
X# # processing.
X# return arg1
X# end
X#
X# So what happens if we get an expression without an enclosing right
X# parenthesis? We get an error message, an increment of the error
X# count, and a resynchronized parser that's ready to go (and possibly
X# find more errors). This mechanism is very much like the mechanism
X# YACC uses.
X#
X# Note that IBPAG will only pop one state off of the stack in
X# efforts to find a state for which "error" is a legitimate lookahead
X# token; any more than this, and we could easily back so far out the
X# current production that we would end up causing confusion about
X# just where the error occurred! If you write error productions,
X# make sure that the error terminal is at most one token removed from
X# the spot where you expect errors to occur.
X#
X#
X# VII. IBPAG-Icon code compatibility (plus gotchas)
X#
X# Regular Icon procedures may be freely intermingled with IBPAG
X# rules, so you can include iparse_tokens() right in the same file as
X# your rule definitions. Note that rules, other than the header, in
X# fact, behave somewhat like procedures in the sense that when a rule
X# is triggered, the parser executes the code for that rule just the
X# way code for a procedure is executed when it is invoked. The main
X# difference between Icon procedures and rules is that the rules can
X# only return one value or fail. If you want a rule to be able to
X# produce more than one value, then have it return a coexpression.
X# Failure in a rule signals that the entire RHS of the production
X# should be discarded, and the stack should be returned to the state
X# it was in before that RHS's first symbol was shifted onto the
X# stack. Failure is really just another way of providing the user
X# more control over error handling and recovery. If you have nothing
X# useful to return, but do *not* want to signal an error, BE SURE TO
X# RETURN A NULL DESCRIPTOR ("return &null")!
X#
X#
X# VII. The parser itself
X#
X# In the output file, the parser that IBPAG constructs is called
X# iparse(). You should invoke it somewhere to get the parser
X# started. It takes a single argument: A stream (i.e. an open file).
X# You could pass it a string, theoretically, instead of a file.
X# Iparse() passes its first argument to iparse_tokens(), so be sure
X# iparse_tokens() understands that it is working on a string instead
X# of a file if you choose to do things this way. IBPAG's output is
X# compressed somewhat, but the parser and error handler are human-
X# readable. Please look at them and modify them as needed. Note
X# especially the debugging facilities that are included with them
X# (e.g. dump_lists()).
X#
X#
X# VIII. Miscellaneous
X#
X# As it encounters errors, the parser, iparse(), increments the
X# global variable errors by one. Normally one botched production
X# will cause a series of cascading errors. As a result, the error
X# handler only increments "errors" for the first in any unbroken
X# series of errors. When (and if) the parser manages to
X# resynchronize itself, and the cascade terminates, the errors
X# variable is once again incremented normally, once for each error
X# encountered.
X#
X# If no error processing is desired at all, you may set the
X# global variable fail_on_error to something other than &null. This
X# tells iparser's error handler simply to fail if there are any
X# syntax errors. Note that "error" productions (described above, in
X# the section on reserved tokens) will still work if they have been
X# used in the grammar. So in fact, setting fail_on_errors will cause
X# failure to occur for all errors that cannot be handled by
X# user-defined error productions. What this does is bypass the
X# built-in error detection and resynchronization routines. This may
X# be useful if an IBPAG file is part of a much larger Icon program
X# that needs to have errors registered internally, and not spit out
X# separately to stderr by a renegade parser :-).
X#
X############################################################################
X#
X# Below is a sample IBPAG source file. It's kind of a hack, but
X# I'm sure it will adequately illustrate the basic principles:
X#
X# link radcon
X# global ID_tbl
X#
X# procedure main()
X# iparse(&input)
X# end
X#
X# rule 0 left S(stream)
X# return
X# end
X#
X# rule 1 left stream(calc, stream)
X# return
X# end
X#
X# rule 1 left stream(calc)
X# return
X# end
X#
X# rule 2 none calc("CR")
X# return
X# end
X#
X# rule 3 left calc(expr, "CR")
X# # Radcon can't handle reals, so this doesn't really work.
X# line_number +:= 1
X# return write(radcon(arg1, \(\ID_tbl)["ibase"] | 10,
X# \(\ID_tbl)["obase"] | 10))
X# end
X#
X# rule 4 none expr("ID")
X# if not (return \ID_tbl[arg1])
X# then write(&errout, "uninitialized variable, line ", line_number)
X# fail
X# end
X#
X# rule 6 right expr("ID", "=", expr)
X# initial ID_tbl := table()
X# ID_tbl[arg1] := arg3
X# return arg3
X# end
X#
X# rule 10 left expr(expr, "+", expr)
X# return arg1 + arg3
X# end
X#
X# rule 10 left expr(expr, "-", expr)
X# return arg1 - arg3
X# end
X#
X# rule 11 left expr(expr, "*", expr)
X# return arg1 * arg3
X# end
X#
X# rule 11 left expr(expr, "/", expr)
X# return arg1 / arg3
X# end
X#
X# rule 11 left expr(expr, "%", expr)
X# return arg1 % arg3
X# end
X#
X# rule 12 left expr(expr, "^", expr)
X# return arg1 ^ arg3
X# end
X#
X# rule 50 none expr("(", expr, ")")
X# return arg2
X# end
X#
X# rule 75 right expr("-", expr)
X# return -arg2
X# end
X#
X# rule 75 right expr("+", expr)
X# return +arg2
X# end
X#
X# rule 100 none expr("NUM")
X# if find(".", arg1)
X# then return real(arg1)
X# else return integer(arg1)
X# end
X#
X# #
X# # iparse_tokens: file -> TOK records (a generator)
X# # stream -> tokens
X# #
X# # Where file is an open input stream, and tokens are TOK
X# # records holding both the token type and actual token text.
X# #
X# # TOK records contain two parts, a preterminal symbol (the first
X# # "sym" field), and the actual text of the token ("str"). The
X# # parser above only pays attention to the sym field, although the
X# # user might well want to use the actual text.
X# #
X# procedure iparse_tokens(stream)
X#
X# local token, c, whitespace, operators
X# #global line_number
X# whitespace := '\r\t \n'
X# operators := '+-*/^()='
X#
X# token := ""
X# every c := !(!&input || "\n") do {
X# if not any(whitespace ++ operators, c) then {
X# token ||:= c
X# } else {
X# if integer(token)
X# then suspend TOK("NUM", "" ~== token)
X# else suspend TOK("ID", "" ~== token)
X# if any(operators, c) then
X# suspend TOK(c)
X# else {
X# if c == "\n" then {
X# line_number +:= 1
X# suspend TOK("CR"|"CR")
X# }
X# }
X# token := ""
X# }
X# }
X# if integer(token)
X# then suspend TOK("NUM", "" ~== token)
X# else suspend TOK("ID", "" ~== token)
X# suspend TOK("CR"|"$")
X#
X# end
X#
X############################################################################
X#
X# Links: options.icn
X#
X# See also: maketbls.icn, preproc.icn
X#
X############################################################################
X
Xlink options, structs, ximage
X
Xglobal DEBUG, VERBOSE
X# For other record declarations, see maketbls.icn.
Xrecord RE(state, procname, sym, size)
Xrecord SH(state)
Xrecord AC()
X
Xprocedure main(a)
X
X local f, f2, lname, opttbl, usage, tlst, sym, seen,
X act, elt, size, r, i, k
X # global ptbl, alst, glst
X
X usage := "usage: ibpag [-d] [-v] [-f input-file]"
X
X opttbl := options(a, "dvf:") | stop(usage)
X if DEBUG := \opttbl["d"]
X then VERBOSE := 1
X else VERBOSE := opttbl["v"]
X f := \opttbl["f"] | &input
X f2 := &output
X
X ptbl := table()
X makeptbl(f, f2)
X
X # set up (global) alst and glst (the action and goto lists)
X CONST_TABLE()
X
X #
X # Set up tlst (a set of all tokens used in the grammar), then
X # turn the ACT records into RE, SH, or AC records.
X #
X tlst := set()
X # squeeze a list of rules out of the action list; find the
X # terminals in their right-hand sides
X every sym := !\(\(!(!alst)).by_rule).RHS do
X \sym.terminal & insert(tlst, sym.str)
X insert(tlst, "$")
X
X #
X # Clobber all of the ACT records, and turn them into RE, SH, or AC
X # records. Get rid of structurally identical, structures by
X # making them all point to the same structure.
X #
X if \VERBOSE then
X write(&errout, "Merging duplicate structures in action list...")
X seen := set()
X every i := 1 to *alst do {
X if /alst[i] then next
X if *alst[i] = 0 then {
X alst[i] := &null
X next
X }
X every k := key(alst[i]) do {
X if alst[i][k].str == "reduce" then {
X r := alst[i][k].by_rule
X size := *r.RHS
X # Don't count epsilon moves when calculating sizes!
X every elt := !r.RHS do
X if elt.str == "" & \elt.terminal then size -:= 1
X act := RE(alst[i][k].state, r.procname, r.LHS, size)
X }
X # Don't need to store all this info for shifts or accepts.
X else if alst[i][k].str == "shift"
X then act := SH(alst[i][k].state)
X else act := AC()
X if alst[i][k] := equiv(act, !seen)
X then next
X else {
X alst[i][k] := act
X insert(seen, act)
X }
X }
X # Simple optimization; if alst[i] is structured identically to
X # a previously seen table, use the previously seen one! Note
X # that equiv is more general than Equiv (it can accept sets
X # and tables). See the IPL for equiv(). Equiv() is in
X # debug.src.
X if alst[i] := equiv(alst[i], !seen)
X then next else insert(seen, alst[i])
X }
X
X write(f2, "")
X write(f2, "############################################################################")
X write(f2, "#")
X write(f2, "# The following code has been inserted by IBPAG (Icon-based")
X write(f2, "# Parser Generator). It contains a stack-based LR parser, an")
X write(f2, "# error handler, and a debugging tool. The user must supply a")
X write(f2, "# tokenizing routine called iparse_tokens().")
X write(f2, "#")
X write(f2, "# For a description of IBPAG source-file format, see the")
X write(f2, "# documentation prepended to ibpag.icn.")
X write(f2, "#")
X write(f2, "############################################################################")
X write(f2, "#")
X write(f2, "# See also: ibpag.icn, maketbls.icn, preproc.icn, itokens.icn")
X write(f2, "#")
X write(f2, "############################################################################")
X write(f2, "")
X write(f2, "# uncomment for the compiler")
X write(f2, "# invocable \"symbol\", \"TOK\", \"RE\", \"SH\", \"AC\"")
X write(f2, "")
X write(f2, "# I use ximage for debugging; remove it if you don't need it.")
X write(f2, "link codeobj#, ximage")
X write(f2, "")
X write(f2, "record symbol(str, terminal)")
X write(f2, "record TOK(sym, str)")
X write(f2, "record RE(state, procname, sym, size)")
X write(f2, "record SH(state)")
X write(f2, "record AC()")
X write(f2, "")
X write(f2, "global line_number, errors, fail_on_error")
X write(f2, "")
X write(f2, "#")
X write(f2, "# iparse: file -> ?")
X write(f2, "# stream -> ?")
X write(f2, "#")
X write(f2, "# Where stream is an open file, and ? represents the user-defined")
X write(f2, "# result of a completed parse of file, from the current location")
X write(f2, "# up to the point where the parser executes an \"accept\" action.")
X write(f2, "#")
X write(f2, "# The second to fifth arguments are used on recursive calls from")
X write(f2, "# the error handler, iparse_error. Ignore them, unless you are")
X write(f2, "# sure of what you are doing!")
X write(f2, "#")
X write(f2, "procedure iparse(stream, state_stack, value_stack, next_token, err_state)")
X write(f2, "")
X write(f2, " local start_symbol, token, act, last_symbol, arglist")
X write(f2, " static alst, glst")
X write(f2, " #global line_number, errors")
X write(f2, " initial {")
X every lname := "alst" |"glst" do {
X encode(variable(lname)) ? {
X writes(f2, "\t", lname, " := decode(\"")
X if write(f2, move(47), "_") then {
X while write(f2, "\t ",move(60), "_")
X write(f2, "\t ", tab(0), "\")")
X }
X else write(f2, tab(0), "\")")
X }
X }
X write(f2, "\t#")
X write(f2, "\t# Uncomment the following if you want a look at the state and goto")
X write(f2, "\t# tables. If you aren't planning on looking at them, find the")
X write(f2, "\t# procedure definition for dump_lists below, and delete it. Why")
X write(f2, "\t# keep it around if it's not being used?")
X write(f2, "\t#")
X write(f2, "\t# dump_lists(&errout, alst, glst)")
X write(f2, " }")
X write(f2, " #")
X write(f2, " # New, not recursive, invocation; reset stacks, line number and")
X write(f2, " # error count.")
X write(f2, " #")
X write(f2, " start_symbol := ", image(start_symbol))
X write(f2, " /err_state := 0")
X write(f2, " /state_stack := [1] & line_number := 0 & errors := 0")
X write(f2, " /value_stack := []")
X write(f2, " /next_token := create iparse_tokens(stream)")
X write(f2, "")
X write(f2, " # &trace := -1")
X write(f2, "")
X write(f2, " while token := @next_token do {")
X write(f2, "\trepeat {")
X write(f2, "\t act := \\alst[state_stack[1]][token.sym] | {")
X write(f2, "\t\t#")
X write(f2, "\t\t# You can replace this error handler with whatever you")
X write(f2, "\t\t# like, and have it do whatever you like.")
X write(f2, "\t\t#")
X write(f2, "\t\treturn iparse_error(alst, state_stack, value_stack,")
X write(f2, "\t\t\t\t token, next_token, err_state + 1)")
X write(f2, "\t }")
X write(f2, "")
X write(f2, "\t # If we're not working on an error production, assume that")
X write(f2, "\t # we have a fully (re)synchronized parser, & clear err_state. ")
X write(f2, "\t #")
X write(f2, "\t (\\last_symbol | token.sym) == \"error\" | { err_state := 0 }")
X write(f2, "\t last_symbol := token.sym")
X write(f2, "")
X write(f2, "\t case type(act) of {")
X write(f2, "\t \"SH\" : {")
X write(f2, "\t\t # push the next state onto the state stack")
X write(f2, "\t \t push(state_stack, act.state)")
X write(f2, "\t\t # push the current token's literal value onto the")
X write(f2, "\t\t # value stack")
X write(f2, "\t\t push(value_stack, token.str)")
X write(f2, "\t\t # break out of enclosing repeat loop")
X write(f2, "\t\t break")
X write(f2, "\t }")
X write(f2, "\t \"RE\" : {")
X write(f2, "\t \t arglist := []")
X write(f2, "\t\t #")
X write(f2, "\t\t # Pop as many elements off of the token stack as")
X write(f2, "\t\t # there are symbols in the right-hand side of the")
X write(f2, "\t\t # rule. Push these elements onto an argument list.")
X write(f2, "\t\t #")
X write(f2, "\t\t every 1 to act.size do {")
X write(f2, "\t\t pop(state_stack)")
X write(f2, "\t\t push(arglist, pop(value_stack))")
X write(f2, "\t\t }")
X write(f2, "\t\t #")
X write(f2, "\t\t # Check to goto list to see what state we should")
SHAR_EOF
true || echo 'restore of ibpag.icn failed'
fi
echo 'End of part 1'
echo 'File ibpag.icn is continued in part 2'
echo 2 > _shar_seq_.tmp
exit 0
--
-Richard L. Goerwitz goer%midway@uchicago.bitnet
goer@midway.uchicago.edu rutgers!oddjob!ellis!goer